Display system capable of accepting user commands by use of voice and gesture inputs.



A method of accepting multimedia operation
commands wherein, while pointing to either a
display object or a display position on a display
screen of a graphics display system through a
pointing input device, a user commands the graphics
display system to cause an event on a graphics
display, through a voice input device; comprising a
first step of allowing the user to perform the pointing
gesture so as to enter a string of coordinate points
which surround one area for either the display
object or any desired display position; a second
step of allowing the user to give the voice command
together with the pointing gesture; a third step of
recognizing a command content of the voice
command by a speech recognition process in response
to the voice command; a fourth step of recognizing a
command content of the pointing gesture in
accordance with the recognized result of the third step;
and a fifth step of executing the event on the
graphics display in accordance with the command
contents of the voice command and the pointing
gesture. Thus, the method provides a man-machine
interface which utilizes the plurality of media of the
voice and the pointing gesture, which offers high
operability to the user, and with which an illustration
etc. can be easily edited.
BACKGROUND OF THE INVENTION

The present invention relates to a user interface
method for an illustration edit system etc.
which is installed in OA (office automation) pro- 
ducts such as a personal computer, a workstation 
or a word processor. It provides a method of com- 
manding a display system to cause events on a 
graphics display by the use of information of media 
convenient to a user. 

In, for example, a graphics edit system, when
newly entering a pattern, the following procedural
steps have heretofore been required, as
stated in "MAC LIFE" (No. 25, 9, published by
Kawade Publishers): First, the pattern to be entered
is selected from within a graphics pattern menu 
displayed on a display screen beforehand, by the 
use of a pointing device such as mouse. Subse- 
quently, using the pointing device, the contour line 
of the selected pattern is pointed to, and the pat- 
tern is dragged to determine its input position. 
Further, in the case of determining the size of the 
pattern, the contour line of the pattern is dragged 
with the pointing device so as to adjust the size 
thereof. 

A pattern can be designated by the vocalization
of a demonstrative pronoun with an attendant
gesture, as disclosed in R. A. Bolt: "Put that there."
(Computer Graphics, 14, 3, 1980). Another method
of designating a pattern is giving a name to an 
object drawn on a display screen, before the start 
of a graphics edit mode, and using the name for a 
voice command thenceforth. 

By the way, the assignee of the present ap- 
plication has previously filed Japanese Patent Ap- 
plication No. Hei4-239832 (1992) as to improve- 
ments in a method wherein a voice command with 
an attendant gesture is issued in order to handle a 
subject displayed on a display screen. The patent 
application discloses a technique in which, among 
command content candidates obtained through the 
recognition process of the voice command, speci- 
fied candidates are excluded from the subject to- 
be-recognized in accordance with the number of 
pointing gestures attendant upon the voice com- 
mand. 

As stated above, with the prior art, when the 
pattern is to be newly entered in the graphics edit 
mode, any of standard patterns is first selected to 
display the desired pattern on the screen. There- 
after, the size of the pattern needs to be des- 
ignated. That is, the size cannot be designated 
simultaneously with the input of the pattern. 

Moreover, the pattern already entered cannot 
have its size designated by the gesture. 

Furthermore, in a case where the user wishes
for a specified illustration, he/she must draw the
illustration through, e.g., the combination of basic
patterns by himself/herself. It is also troublesome
that the naming job is required for designating the
drawn object by only the voice input.

SUMMARY OF THE INVENTION

An object of the present invention is to provide 
a method of and system for accepting multimedia 
operation commands, which utilize a plurality of 
media such as a voice and a pointing gesture,
which offer a high operability to a user, and with 
which an illustration etc. can be easily entered and 
edited. 

In one aspect of performance of the present
invention, there is provided a method of accepting
multimedia operation commands wherein, while
pointing to either of a display object or a display
position on a display screen of a graphics display
system through a pointing input device, a user
commands the graphics display system to cause
an event on a graphics display, through a voice
input device; comprising:

the first step of allowing the user to perform
the pointing gesture so as to enter a string of
coordinate points which surround one area for
either of the display object or any desired display
position;

the second step of allowing the user to give
the voice command together with the pointing
gesture;

the third step of recognizing a command
content of the voice command by a speech recognition
process in response to the voice command;

the fourth step of recognizing a command
content of the pointing gesture in accordance with the
recognized result of the third step; and

the fifth step of executing the event on the
graphics display in accordance with the command
contents of the voice command and the pointing
gesture.

In another aspect of performance of the
present invention, there is provided a display
system which is commanded by a user to cause an
event concerning a display object on a graphics
display, by the use of a voice and a pointing
gesture; comprising:

a pointing input device for entering a string of
coordinate points which surround one area for
either of the display object on the graphics display or
a display position of the display object;

a pointing area table which stores therein the
string of coordinate points entered by the pointing
input device;

a bit map data memory for storing therein bit
map data of various display parts that constitute
the display object, and standard maximum widths
and standard maximum lengths of the display
parts;

a drawing table which stores therein identifiers
of the display parts selected from within the bit
map data memory and displayed on the graphics
display, widthwise and lengthwise scale-up/down
ratios of the display parts relative to the standard
maximum widths and lengths on the graphics
display, and positional information of the display parts;

a display parts dictionary which holds therein
speech-recognizable names of the individual
display parts stored in the bit map data memory;

a voice command input device for entering the 
voice command of the user; 

a speech recognition device for recognizing the
voice command entered by the voice command
input device, with reference to the display parts
dictionary;

a display parts extraction device for extracting 
the display parts on the graphics display as des- 
ignated on the basis of the string of coordinate 
points in the pointing area table; 

a target point calculator for calculating a target 
point designated on the basis of the string of 
coordinate points in the pointing area table; 

a scale-up/down ratio calculator for calculating
the widthwise and lengthwise scale-up/down ratio
information of the display parts on the basis of the
string of coordinate points in the pointing area
table; and

a controller for selectively activating at least 
one of the display parts extraction device, the 
target point calculator and the scale-up/down ratio 
calculator in accordance with a result of the speech 
recognition, and for rewriting the drawing table on 
the basis of a result of the activation. 

According to the multimedia operation com- 
mand accepting method and system of the present 
invention, the user designates the display object or 
the display position to-be-entered or edited through 
the voice input, and he/she simultaneously des- 
ignates the display object, the input position there- 
of, the size thereof or the target position thereof 
through the pointing gesture, whereby the user is 
permitted to display the subject to-be-handled at 
the designated position on the display screen or 
with the designated size. By way of example, when 
the display object is to be scaled up or down 
(enlarged or reduced in size), the pointing gesture 
of the desired size may be performed on the 
display screen together with the voice command. 
Besides, when the new input is to be entered, the 
input subject to-be-handled is vocally designated, 
and the pointing gesture of the desired size is also 
performed at the desired position, whereby the size 
can be designated simultaneously with the entry of 
the subject. Thus, the operating steps are simpli- 
fied, and the operability is enhanced. 

Moreover, owing to the part illustrations
dictionary which stores therein the necessary parts for
drawing the illustration, and to the function by
which the designated part is displayed on the
display screen when the part name is given by the
voice input, the user is freed from the necessity to
draw the part to-be-displayed through, e.g., the
combination of basic symbols by himself/herself,
and he/she is permitted to display the part on the
display screen with ease.

Also, a section of the part displayed on the
display screen can be designated by the use of
voice and gesture inputs. For example, a window of
a car displayed on the screen can be designated
by using a voice "window" together with a gesture
pointing to the window to be designated, such that
an event such as deleting, copying, etc. is performed.

The pointing gestures become more versatile
and realize more natural inputs by calculating the
difference value between the input times of the
respectively adjoining individual coordinate points
which constitute the entered strings of coordinate
points, thereby sensing the termination of one
pointing gesture, and by judging the functions of
the sensed individual gestures in accordance with
the command contents of the voice inputs and the
sequence of the information inputs.

The pointing gestures become still more versatile
and realize still more natural inputs by recognizing
the object designated by the user as follows. In the
case where the entered string of coordinate points
indicates the display object and where the area
formed by the string of coordinate points shares
common areas on the display screen with the areas
formed by the displayed objects, the displayed
object which has the largest common area with the
area formed by the string of coordinate points is
determined as the designated subject to-be-handled.
In the case where the area formed by the string of
coordinate points has no common area on the
display screen with the areas formed by the
displayed objects, the designated subject to-be-handled
is determined as the displayed object whose central
point is nearest the coordinate point indicated by
the mean value of the maximum and minimum
X values of the individual coordinate points of the
entered string and the mean value of the maximum
and minimum Y values thereof.

The user is permitted to designate the size of
the display object or the like quickly and easily, in
the case where the scale-up or down (enlargement
or reduction) of the display object or the size of the
new input object is designated, by calculating the
ratio α between the difference value of the minimum
and maximum values in the X coordinate
values of the information of the entered string of
coordinate points and the standard maximum
length of the display object, and the ratio β between
the difference value of the minimum and
maximum values in the Y coordinate values and
the standard maximum width of the display object,
whereupon the calculated ratios α and β are used
for interpolating the cells of the bit map so as to
enlarge the size of the picture or for thinning out
the cells so as to reduce the size of the picture.
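
The ratio calculation described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the function names scale_ratios and resize_bitmap, the sample gesture coordinates, and the use of nearest-neighbour resampling (which repeats cells when a ratio exceeds 1 and skips cells when it is below 1) are all assumptions made for illustration. Only the definitions of α and β follow the text.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def scale_ratios(points: Sequence[Point], std_max_length: float,
                 std_max_width: float) -> Tuple[float, float]:
    """Return (alpha, beta) for a gesture given as (x, y) points.

    alpha = X extent of the gesture / standard maximum length
    beta  = Y extent of the gesture / standard maximum width
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    alpha = (max(xs) - min(xs)) / std_max_length
    beta = (max(ys) - min(ys)) / std_max_width
    return alpha, beta


def resize_bitmap(bitmap: List[List[int]], alpha: float,
                  beta: float) -> List[List[int]]:
    """Resample a row-major bit map by alpha (X) and beta (Y).

    Assumption: nearest-neighbour sampling stands in for the
    "interpolating" and "thinning out" of cells named in the text.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("ratios must be positive")
    src_h, src_w = len(bitmap), len(bitmap[0])
    dst_w = max(1, round(src_w * alpha))
    dst_h = max(1, round(src_h * beta))
    return [
        [bitmap[min(src_h - 1, int(r / beta))][min(src_w - 1, int(c / alpha))]
         for c in range(dst_w)]
        for r in range(dst_h)
    ]
```

For instance, a hypothetical gesture spanning 400 pixels in X and 100 pixels in Y, applied to a part whose standard maximum length is 200 and standard maximum width is 50, gives α = β = 2.0, so each cell of the bit map is repeated twice in both directions.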

BRIEF DESCRIPTION OF THE DRAWINGS 



Fig. 1 is a system block diagram of a WS
(workstation) showing an embodiment of the
present invention;

Fig. 2 is a view showing an example of an
illustration editing picture frame;

Fig. 3 is a diagram showing an example of the
data structure of bit map data (18);

Fig. 4 is a diagram showing an example of the
data structure of a drawing table;

Fig. 5 is a view showing an embodiment in
which a subject to-be-handled is designated by
a voice and a pointing gesture on an illustration
editing picture frame;

Fig. 6 is a view showing an embodiment in
which the scale-up (enlargement) of a subject
to-be-handled is designated by a voice and a
pointing gesture on an illustration editing picture
frame;

Fig. 7 is a module block diagram showing an
example of an acoustic recognition program
(13);

Fig. 8 is a diagram showing an example of the
data structure of a pointing area table;

Fig. 9 is a diagram showing an example of an
information integration program (15);

Fig. 10 is a diagram showing an example of a
drawing table which has been rewritten by the
designation of the subject scale-up;

Fig. 11 is a view showing an example of an
illustration editing picture frame which has been
re-displayed by the designation of the subject
scale-up;

Fig. 12 is a view showing an embodiment in
which a new input is designated by a voice and
a pointing gesture on an illustration editing
picture frame;

Fig. 13 is a diagram showing an example of the
data structure of a part illustrations dictionary
(17);

Fig. 14 is a diagram showing an example of a
drawing table which has been created by the
designation of the new input;

Fig. 15 is a view showing an example of an
illustration editing picture frame which has been
re-displayed by the designation of the new
input;

Fig. 16 is a view showing an embodiment in
which the movement of a subject is designated
by a voice and a pointing gesture on an
illustration editing picture frame;

Fig. 17 is a diagram showing an example of the
data structure of a drawing table which has been
rewritten by the designation of the subject
movement;

Fig. 18 is a view showing an example of an
illustration editing picture frame which has been
re-displayed by the designation of the subject
movement;

Fig. 19 is a diagram showing an example of the
data structure of a command dictionary (19);

Fig. 20 is a flow chart exemplifying the flow of
processing after the start of the acoustic
recognition program (13);

Fig. 21 is a flow chart exemplifying the flow of
processing after the start of a pointing area read
program (14);

Fig. 22 is a flow chart showing an example of
the flow of processing after the start of the
information integration program (15);

Fig. 23 is a flow chart showing an example of
the flow of processing after the start of a
command extraction program (1502);

Fig. 24 is a flow chart showing an example of
the flow of processing after the start of an object
extraction program (1501);

Fig. 25 is a flow chart showing an example of
the flow of processing after the start of a
scale-up/down ratio calculation program (1503); and

Fig. 26 is a flow chart showing an example of
the flow of processing after the start of a target
point calculation program (1504).

PREFERRED EMBODIMENTS OF THE INVENTION

Now, the embodiments of the present invention 
will be described in detail with reference to the 
accompanying drawings. 

Fig. 1 shows the architecture of a system
which realizes a multimedia information entering 
method according to the present invention. Here in 
the description, an illustration edit system is taken 
as an example. 

The system shown in Fig. 1 includes an
information processor 1, main storage 2, a panel
controller 3, a display device 4, a touch panel 5, a
display controller 6, an A/D (analog-to-digital)
converter 7, a microphone 8, a disk 9 and a bit map
memory 4002. The disk 9 stores therein a system
program 11, an illustration edit program 12, an
acoustic recognition program 13, a pointing area
read program 14, an information integration
program 15, acoustic standard pattern data 16, a part
illustrations dictionary 17, bit map data 18 and a
command dictionary 19. The stored contents 11 -
19 of the disk 9 are loaded in the main storage 2
when the system is started up. Besides, the main
storage 2 stores therein tables 204 and 4000 which
will be explained later. A buffer area W to be
explained later is also secured in the main storage
2. Display data in pixel units corresponding to the
screen of the display device 4 are stored in the bit
map memory 4002.

EP 0 594 129 A2

As shown in Fig. 7, the acoustic recognition 
program 13 is configured of a voice input program 
1300 and a feature extraction program 1301.

As shown in Fig. 9, the information integration 
program 15 is configured of the modules of a 
syntax check program 1500, an object extraction 
program 1501, a command extraction program 
1502, a scale-up/down ratio calculation program 
1503 and a target point position calculation
program 1504.

Fig. 2 shows an example of an illustration ed- 
iting picture frame which is displayed on the dis- 
play device 4 through the illustration edit program 
12 loaded in the main storage 2. In the figure, a car 
B and a rectangle C are drawn in a graphics mode 
on the basis of the drawing table (4000 in Fig. 4) 
stored in the main storage 2, by the illustration edit 
program 12. 

Referring to Fig. 3, the bit map data 18 of the 
standard symbols of part illustrations such as the 
"car B" and "rectangle C" are prepared for the 
respective part illustrations. In the figure, numeral 
301 denotes the identification No. of the bit map of 
the standard symbol of each part illustration, nu- 
meral 302 the name of the corresponding illustra- 
tion part, and numeral 303 the data Nos. of the 
individual pixel data of the corresponding part il- 
lustration. In addition, numeral 304 represents the 
difference value between the minimum and maxi- 
mum X-axial values among the nonzero coordinate 
values of the pixel data in the case where the bit 
map data 18 of the corresponding part illustration 
are expanded on the illustration edit picture frame 
(the bit map memory 4002) under the condition 
that pixel No. 1, namely bit No. 1, is expanded at a
position (0, 0) on the picture frame. The difference
value 304 shall be termed the "standard maximum 
length XD". Likewise, numeral 305 represents the 
difference value between the minimum and maxi- 
mum Y-axial values among the nonzero coordinate 
values of the pixel data in the case where the bit 
map data 18 of the corresponding part illustration 
are expanded on the illustration edit picture frame 
under the condition that pixel No. 1, namely bit No.
1, is expanded at the position (0, 0) on the picture 
frame. The difference value 305 shall be termed 
the "standard maximum width YD". In the case of 
the part illustration "car" of bit map No. 1, the 
standard maximum length XD is expressed as 
"200", and the standard maximum width YD as 
"50". 



The drawing table 4000 shown in Fig. 4
corresponds to the example of Fig. 2. Regarding the
part illustration "car B", the standard bit map No.
"1" indicated in Fig. 3, a widthwise scale-up/down
ratio "1.0", a lengthwise scale-up/down ratio "1.0"
and central coordinates "(350, 330)" are stored in
the row for part No. 1. On the other hand,
regarding the part illustration "rectangle C", standard bit
map No. "12" indicated in Fig. 3, a widthwise
scale-up/down ratio "1.0", a lengthwise
scale-up/down ratio "1.0" and central coordinates "(300,
80)" are stored in the row for part No. 2. The
maximum value of part Nos. (the largest part
illustration number) is stored in the buffer area W
(not shown) of the main storage 2. The central
coordinates of each part illustration are indicated
by the mean value of the minimum and maximum
values on the X-axis of the illustration editing
picture frame and that of the minimum and maximum
values on the Y-axis thereof.
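
As an illustration of how the rows of the bit map data 18 and the drawing table 4000 relate, the following Python sketch models one row of each and derives the on-screen extent of a part. The class and field names are hypothetical, and the sketch assumes that the lengthwise ratio scales the standard maximum length XD along the X-axis and the widthwise ratio scales the standard maximum width YD along the Y-axis; the text does not state this mapping explicitly.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StandardPart:
    """One row of the bit map data 18 (pixel data omitted)."""
    bitmap_no: int
    name: str
    std_max_length: float  # XD, extent along the X-axis
    std_max_width: float   # YD, extent along the Y-axis


@dataclass
class DrawingRow:
    """One row of the drawing table 4000."""
    part_no: int
    bitmap_no: int
    widthwise_ratio: float
    lengthwise_ratio: float
    center: Tuple[float, float]


def bounding_box(row: DrawingRow,
                 part: StandardPart) -> Tuple[float, float, float, float]:
    """Return (x_min, x_max, y_min, y_max) of the displayed part.

    Assumption: the stored central coordinates are the midpoint of the
    scaled extents, as the definition of the central coordinates suggests.
    """
    half_length = part.std_max_length * row.lengthwise_ratio / 2
    half_width = part.std_max_width * row.widthwise_ratio / 2
    cx, cy = row.center
    return (cx - half_length, cx + half_length,
            cy - half_width, cy + half_width)


# Values for the "car" taken from the description of Figs. 3 and 4.
car = StandardPart(bitmap_no=1, name="car", std_max_length=200, std_max_width=50)
car_row = DrawingRow(part_no=1, bitmap_no=1, widthwise_ratio=1.0,
                     lengthwise_ratio=1.0, center=(350, 330))
```

Under these assumptions, bounding_box(car_row, car) spans 250 to 450 on the X-axis and 305 to 355 on the Y-axis, and the midpoints of those ranges reproduce the stored central coordinates (350, 330).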

It is assumed here that, as shown in Figs. 5
and 6, the user of the system enters the intention
of scaling up (enlarging) the part "car B" through
voices and gestures. First, when the illustration edit
program 12 is executed in the information
processor 1, the acoustic recognition program 13 loaded
in the main storage 2 is started. An example of the
flow of the consequent processing is shown in Fig.
20. Once the acoustic recognition program 13 has
been started, the voice input program 1300 is also
started. On this occasion, as shown in Fig. 5 by
way of example, the user points to a point D on the
touch panel 5 with a finger tip A, a pen or the like,
thereby designating the subject to-be-handled.
Subsequently, as shown in Fig. 6, the user draws a
curve D', thereby designating a size. While
performing the gestures shown in Figs. 5 and 6, the
user enters the voice command "ENLARGE THIS
ABOUT THIS SIZE" ("SCALE UP THIS TO ABOUT
THIS SIZE" or "SCALE UP THIS ABOUT THIS
MUCH"). More concretely, the voice command S1
in the state of Fig. 5 and the voice command S2 in
the state of Fig. 6 are input by means of the
microphone 8 substantially simultaneously with the
gestures (step S101 in Fig. 20). The voice inputs
S1 and S2 are applied to the A/D converter 7 by
the voice input program 1300 and are then
converted into a digital signal, which is sent to the
main storage 2 (S102). Subsequently, the feature
extraction program 1301 is started, and the digital
signal is transformed into a time series of LPC
cepstrum coefficients as a feature vector at a frame
cycle of 10 [msec.] (S103). The time series of LPC
cepstrum coefficients is stated in, for example,
Saitoh and Nakata: "Fundamentals of Voice
Information Processing" (1981, The Ohm-Sha, Ltd.).
Here, the frame cycle is not restricted to 10
[msec.], but it can be set at any desired time
period such as 20 [msec.] or 30 [msec.]. Also, the
feature vector is not restricted to the LPC cepstrum
coefficients, but it can be replaced with, e.g., the
output of a band-pass filter.

Meanwhile, the pointing area read program 14 
loaded in the main storage 2 is started by the 
information processor 1 at the same time that the 
acoustic recognition program 13 is started by the 
above method. Thus, the processing of the
program 14 proceeds concurrently. An example of the
flow of this processing is shown in Fig. 21. The 
pointing area table 204 which is utilized in the 
processing of Fig. 21, is shown in Fig. 8. This table 
204 is made up of the columns of coordinate No. 
indicated at numeral 200, an input time 201, an
x-coordinate 202 and a y-coordinate 203. Thus, the
x-coordinate and y-coordinate data and the input 
times thereof are stored in this table 204 from 
coordinate No. 1 in the order in which the data are 
entered, successively at fixed time intervals. In the 
example mentioned here, the coordinate data are 
stored at the time intervals of 100 [msec.]. A time 
interval of 1000 [msec] is involved between coordi- 
nate No. 4 and coordinate No. 5, and it is recog- 
nized as corresponding to the boundary between 
the first and second pointing gestures. 

In the processing of Fig. 21, variables P0 and
Q which are set in the buffer area W of the main
storage 2 are first reset to zero (steps S201 and
S202, respectively). While the user is in touch with
the touch panel 5 with, e.g., the finger tip or the
pen (S203), the pointing area read program 14
accepts touched coordinates through the panel
controller 3 at fixed time intervals which are very
short (S204). The program 14 increments the
variable P0 each time it accepts the coordinates
(S205). Further, the program 14 writes the
accepted x-coordinate into an array X[P0] in the
pointing area table 204 (Fig. 8) of the main storage
2, the accepted y-coordinate into an array Y[P0],
and the input time of the coordinates into an array
T[P0] (S205). When a certain predetermined time
period T0 has lapsed since the release of the finger
tip, the pen or the like from the touch panel 5, the
program 14 terminates the writing operation
(S203). The predetermined time period T0 is set at
a duration long enough to recognize the boundary
between individual operation commands such as
scale-up/down, movement, new entry and copy.
After the termination of the above writing operation,
in order to determine the number of pointing
gestures, the program 14 checks if the difference value
(T[i+1] - T[i]) of the input times of the adjoining
individual coordinate Nos., for example, coordinate
No. i and coordinate No. (i + 1) stored in the
pointing area table 204, is equal to or greater than a
certain predetermined value Tg (S206). When this
condition is encountered anywhere, the program 14
increments the variable Q and writes the coordinate
No. i into an array Z in the main storage 2 (in the
form of Z[Q] = i) (S207). Such steps are iterated
until the value i becomes greater than the value P0
(S208). The predetermined value Tg is a time
interval which is long enough to recognize the boundary
between gestures in a single operation command,
and which is shorter than the time period T0. The
array Z(Q) corresponds to the last one of a series
of coordinate Nos. created by the Qth
gesture. This array Z(Q) is utilized at a step S501
in Fig. 24 which will be referred to later.
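
The boundary detection of steps S206 to S208 can be sketched in Python as follows. This is a minimal illustration rather than the program 14 itself: the function name, the threshold of 500 msec chosen for Tg, and the appending of the final coordinate No. as the end of the last gesture are assumptions. The sample times mirror the example of Fig. 8, in which the points arrive 100 msec apart with a 1000 msec pause between coordinate No. 4 and coordinate No. 5.

```python
from typing import List, Sequence


def gesture_boundaries(times_ms: Sequence[int], tg_ms: int) -> List[int]:
    """Return Z, the 1-based coordinate No. that ends each gesture.

    A boundary is placed after coordinate No. i whenever
    T[i+1] - T[i] >= Tg, following steps S206 and S207.
    """
    z: List[int] = []
    for i in range(len(times_ms) - 1):
        if times_ms[i + 1] - times_ms[i] >= tg_ms:
            z.append(i + 1)  # convert the 0-based index to a coordinate No.
    if times_ms:
        # Assumption: the last coordinate No. closes the final gesture.
        z.append(len(times_ms))
    return z


# Hypothetical input times modelled on the Fig. 8 example.
times = [0, 100, 200, 300, 1300, 1400, 1500]
boundaries = gesture_boundaries(times, tg_ms=500)
```

With these inputs the first gesture ends at coordinate No. 4 and the second at coordinate No. 7, so the first gesture occupies coordinate Nos. 1 to 4 of the pointing area table.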

When the user's pointing and vocalization have
ended, the information integration program 15
loaded in the main storage 2 is started. Fig. 22 shows
an example of the flow of processing after the start
of the information integration program 15.

Before the description of the processing in Fig.
22, an example of the data structure of the
command dictionary 19 which is utilized in this
processing will be explained with reference to Fig. 19.
The command dictionary 19 is configured of the
columns of a verb 191, a command 192 signified
by the corresponding verb, command No. indicated
at numeral 193, word attributes 194, and word
attribute No. indicated at numeral 195. The verb
may include "ENLARGE", "MOVE", "DRAW",
"SCALE", "REDUCE", etc., although not all shown
in Fig. 19. The word attributes 194 include "Object
name" which indicates a display object, "Size"
which indicates the size of the display object in a
scale-up/down operation, and "Position" which
indicates the position of the display object, etc. The
items of the "Object name" are demonstrative
pronouns such as "THIS" and "THAT", and other
words contained in the illustration parts dictionary
17. Expressions such as "THIS SIZE" and "THIS
MUCH" correspond to the "Size". Demonstrative
adverbs such as "HERE" and "THERE"
correspond to the "Position". Each of the word attributes
is endowed with the No. peculiar thereto (as
indicated at numeral 195).

Referring now to Fig. 22, an array S in the
main storage 2 is reset to zero (step S301), and the
syntax check program 1500 is started (S302).
Subsequently, the matching between the feature vector
obtained before and the acoustic standard pattern
data 16 is performed by, for example, a method
stated in Kitahara et al.: "Study on Colloquial
sentence accepting method in Information retrieval
system based on Voice inputs" (3-5-7, 1991, The
Japan Institute of Acoustics), with the result that the
input voices are transformed into a character string
(S303). In the foregoing example, the character
string obtained on the basis of the input voices
becomes "ENLARGE THIS ABOUT THIS SIZE"
("SCALE UP THIS TO ABOUT THIS SIZE" or
"SCALE UP THIS ABOUT THIS MUCH"). Further,
the character string is subjected to a morphemic
analysis by a conventional method (S304). As a
result, morphemic information such as "ENLARGE"
(verb phrase), "THIS" (pronoun, object name) and
"THIS SIZE" (adverb phrase, size) is obtained.

At the next step, the command extraction
program 1502, the processing of which is exemplified
in Fig. 23, is started (S305). The verb column 191
in the command dictionary 19 explained before is
referred to, and the verb phrase "ENLARGE" of the
morphemic information is collated therewith (S401
in Fig. 23), whereby "ENLARGE" 1911 (in the verb
column 191 in Fig. 19) is selected. As a result,
command No. "1" (indicated at numeral 1941 in
the command No. column 193) corresponding to
the command content "ENLARGE" (indicated in
the command column 192) is stored in the array S
held in the main storage 2, in the form of S[0] = 1
(S306, S402). Besides, the word attributes of the
morphemic information are respectively collated
with the word attribute column 194 of the command
dictionary 19, thereby deriving the word attribute
Nos. thereof from the word attribute No. column
195 (S403). In the example here, the word
attributes "Object name" and "Size" are selected,
and the respectively corresponding word attribute
Nos. "11" and "12" are stored in the array
elements S[1] - S[m] of the array S (where "m"
denotes the number of word attributes) in the order
in which they are selected, in the forms of S[1] =
11 and S[2] = 12 (S306, S404).
Subsequently, the content of the command is
identified in the light of the command No. stored in the
array element S[0]. In the example here,
"ENLARGE" is identified. Further, the array element
S[1] is first checked in accordance with the word
attribute No. input order of the array elements S[1]
- S[m], and it is decided as indicating the word
attribute No. "11", namely, the word attribute
"Object name" (S307). Accordingly, the object
extraction program 1501 is activated or started for the
first pointing gesture shown in Fig. 5 (S308).
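
The way the array S is filled and then used to choose a program can be illustrated with the short Python sketch below. The dictionary contents are limited to the values stated above (command No. 1 for "ENLARGE", word attribute Nos. 11 and 12); the function names, the tuple layout of the morphemic information, and the lookup table from word attribute No. to program are assumptions introduced for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Entries stated in the description; other rows of Fig. 19 are omitted.
COMMAND_NOS = {"ENLARGE": 1}
WORD_ATTRIBUTE_NOS = {"Object name": 11, "Size": 12}

# Hypothetical mapping from word attribute No. to the program started for it.
PROGRAM_FOR_ATTRIBUTE = {
    11: "object extraction program 1501",
    12: "scale-up/down ratio calculation program 1503",
}

# (word, part of speech, word attribute or None)
Morpheme = Tuple[str, str, Optional[str]]


def build_s_array(morphemes: Sequence[Morpheme]) -> List[int]:
    """Return S, where S[0] is the command No. and S[1..m] are the
    word attribute Nos. in the order in which they were spoken."""
    s = [0]  # S[0] stays 0 if no known verb is found
    for word, part_of_speech, _ in morphemes:
        if part_of_speech == "verb phrase" and word in COMMAND_NOS:
            s[0] = COMMAND_NOS[word]
    for _, _, attribute in morphemes:
        if attribute in WORD_ATTRIBUTE_NOS:
            s.append(WORD_ATTRIBUTE_NOS[attribute])
    return s


def programs_to_start(s: Sequence[int]) -> List[str]:
    """List, gesture by gesture, the program started for S[1], S[2], ..."""
    return [PROGRAM_FOR_ATTRIBUTE[no] for no in s[1:]]


utterance = [
    ("ENLARGE", "verb phrase", None),
    ("THIS", "pronoun", "Object name"),
    ("THIS SIZE", "adverb phrase", "Size"),
]
```

For the utterance above, build_s_array yields S = [1, 11, 12], so the first pointing gesture is handed to the object extraction program 1501 and the second to the scale-up/down ratio calculation program 1503.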

The object extraction program 1501 shown in
Fig. 24 calculates the minimum value XMn0 and
maximum value XMx0 within X coordinate values
stored in the elements X[1] - X[Z(1)] of an array X
in the main storage 2 and also the minimum value
YMn0 and maximum value YMx0 within Y
coordinate values stored in the elements Y[1] - Y[Z(1)] of
an array Y, and it calculates all coordinate values
expressed as (Xq, Yq) within a range defined by
XMn0 ≤ Xq ≤ XMx0 and YMn0 ≤ Yq ≤ YMx0
(S501). Incidentally, symbols 1 - Z(1) correspond
to the string of coordinate Nos. of the first gesture.
Besides, as to each of the part Nos. 1 - W stored
in the drawing table 4000, the program 1501
calculates the minimum X value XMn, maximum X
value XMx, minimum Y value YMn and maximum Y
value YMx of each screen display object from the
central coordinates and scale-up/down ratios of the
drawing table 4000 and the bit map data 18, and it
calculates all coordinate values expressed as (Xz,
Yz) within a range defined by XMn ≤ Xz ≤ XMx and
YMn ≤ Yz ≤ YMx. Further, the program 1501
collates all the coordinate values of the individual
screen display objects, represented by (Xz, Yz),
with all the coordinate values represented by (Xq,
Yq) (S502). The numbers of coordinate points
having agreed in the collation are respectively stored
in the elements COM[1] - COM[W] of an array
COM in the main storage 2. Herein, if the element
COM[j] of the array COM has the maximum value
among the elements COM[1] - COM[W] thereof,
the jth display object is decided as the designated
subject to-be-handled (S503). On the other hand, in
a case where no coordinates have agreed in the
collation, the display object which has central
coordinates nearest coordinates (Xnt, Ynt) indicated
by the mean value of the X values XMn0 and XMx0
and the mean value of the Y values YMn0 and
YMx0 is decided as the designated subject (S309,
S504, S505). In the example of the pointing area
table 204 shown in Fig. 8, the distance or length
Ln1 ≈ 233 between the central coordinates (350,
330) of the "car B" and the coordinates (Xnt, Ynt)
= (550, 450) at the coordinate Nos. "1" - "4"
corresponding to the first pointing gesture shown in
Fig. 5 is smaller than the distance Ln2 ≈ 447
between the central coordinates (300, 80) of the
"rectangle C" and the coordinates (Xnt, Ynt) =
(550, 450) (that is, Ln1 < Ln2 holds). Therefore, the
"car B" is decided as the designated subject.

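The selection rule of the object extraction program 1501 (steps S501 to S505) can be sketched in Python as follows. This is a minimal sketch, not the program of Fig. 24: it assumes integer coordinates and counts the shared coordinate points of two rectangular ranges in closed form instead of enumerating and collating every point. The function names, the table layout and the bounding boxes in the usage note are hypothetical; only the central coordinates (350, 330), (300, 80) and (550, 450) are taken from the example above.

```python
import math


def bounding_box(points):
    """Return (x_min, y_min, x_max, y_max) of a string of coordinate points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def overlap_count(a, b):
    """Number of integer coordinate points shared by two boxes (the COM value)."""
    width = min(a[2], b[2]) - max(a[0], b[0]) + 1
    height = min(a[3], b[3]) - max(a[1], b[1]) + 1
    return max(width, 0) * max(height, 0)


def extract_object(gesture_points, objects):
    """objects maps a name to {"box": (x_min, y_min, x_max, y_max), "center": (x, y)}."""
    g = bounding_box(gesture_points)
    com = {name: overlap_count(g, obj["box"]) for name, obj in objects.items()}
    best = max(com, key=com.get)
    if com[best] > 0:
        return best  # S503: the object sharing the most points is designated
    # S504, S505: nothing overlaps, so take the nearest central coordinates.
    xnt, ynt = (g[0] + g[2]) / 2, (g[1] + g[3]) / 2
    return min(objects, key=lambda n: math.dist(objects[n]["center"], (xnt, ynt)))
```

With a gesture such as `[(500, 400), (600, 400), (600, 500), (500, 500)]`, whose mean point is (550, 450), and two objects centered at (350, 330) and (300, 80) whose boxes do not reach the gesture, the fallback branch is taken and the object centered at (350, 330) is returned, in line with the distances of about 233 and 447 given in the text.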
Subsequently, since the array element S[2] indicates
the word attribute No. "12", namely, the word attribute
"Size" (step S321 in Fig. 22), the scale-up/down ratio
calculation program 1503 is started for the second
pointing gesture shown in Fig. 6 (S310). Incidentally,
the step S321 may instead decide whether the command
content is "Scale-up" or "Movement", rather than
checking the word attribute S[2]. As shown in Fig. 25,
the scale-up/down ratio calculation program 1503
calculates the ratio α of the difference value
(XMx1 - XMn1) between the minimum value XMn1 and maximum
value XMx1 within the X coordinate values stored in the
elements X[Z(1)+1] - X[Z(2)] of the array X in the main
storage 2, relative to the standard maximum length XD of
the designated subject, and the ratio β of the
difference value (YMx1 - YMn1) between the minimum value
YMn1 and maximum value YMx1 within the Y coordinate
values stored in the elements Y[Z(1)+1] - Y[Z(2)] of the
array Y in the main storage 2, relative to the standard
maximum width YD of the designated subject (steps S601
and S602 in Fig. 25, and step S311 in Fig. 22).
Incidentally, symbols Z(1)+1 - Z(2) correspond to the
string of coordinate Nos. based on the second gesture.

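The ratio calculation of steps S601 and S602 amounts to two divisions, which can be sketched in Python as follows. The function and parameter names are hypothetical, and the numbers in the usage note are made up for illustration.

```python
def scale_ratios(gesture_points, std_length, std_width):
    """Return (alpha, beta) for the coordinate points of one pointing gesture.

    alpha = (XMx1 - XMn1) / XD and beta = (YMx1 - YMn1) / YD, where XD and YD
    are the standard maximum length and width of the designated subject.
    """
    xs = [x for x, _ in gesture_points]
    ys = [y for _, y in gesture_points]
    alpha = (max(xs) - min(xs)) / std_length
    beta = (max(ys) - min(ys)) / std_width
    return alpha, beta
```

For instance, a gesture spanning 200 units in X and 120 units in Y around a part whose standard maximum length and width are 100 and 60 gives both ratios equal to 2.0.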
Referring back to Fig. 22, the cells of the bit map
are interpolated or thinned out in conformity with the
calculated ratios α = 2.0 and β = 2.0, to thereby scale
the picture up or down (to enlarge or reduce the size of
the picture) (S312). This step S312 corresponds to a
step S603 in Fig. 25. On this occasion, both the
scale-up/down ratios may well be set at either the
lengthwise scale-up/down ratio α or the widthwise
scale-up/down ratio β. It is also possible that the
scale-up/down ratios are fixed to the value α or β,
thereby rendering the picture similar to the standard
form of the part illustration. As a result, the drawing
table 4000 is rewritten as shown in Fig. 10 (S313), and
the illustration edit program 12 re-displays an editing
picture frame as shown in Fig. 11 (S314).
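The description does not say how the cells of the bit map are interpolated or thinned out. One simple possibility is nearest-neighbor resampling, sketched below in Python; the choice of this technique, the list-of-rows bit map layout and the function name are assumptions made for the sketch.

```python
def scale_bitmap(bitmap, alpha, beta):
    """Scale a bit map (list of rows) by alpha in X and beta in Y.

    Nearest-neighbor resampling: ratios above 1 repeat cells (interpolation),
    ratios below 1 skip cells (thinning out).
    """
    src_h, src_w = len(bitmap), len(bitmap[0])
    dst_w = max(1, round(src_w * alpha))
    dst_h = max(1, round(src_h * beta))
    return [
        [
            bitmap[min(src_h - 1, int(r / beta))][min(src_w - 1, int(c / alpha))]
            for c in range(dst_w)
        ]
        for r in range(dst_h)
    ]
```

Passing the same value for both ratios corresponds to the variant in which the picture is kept similar to the standard form of the part illustration.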

In the case of entering a new input, as shown in
Fig. 12 by way of example, the user gives commands "DRAW
A BUILDING ABOUT THIS SIZE" through voice commands and
gestures. The voice commands are received through the
microphone 8 (step S101 in Fig. 20). When the user
designates a display position and a size by drawing a
curve D" on the touch panel 5 with a finger tip, a pen
or the like, the acoustic recognition program 13 loaded
in the main storage 2 is started to convert the voices
into a digital signal and to send the digital signal to
the main storage 2 (S102), and the feature extraction
program 1301 is started to extract a feature vector from
the digital signal (S103), in the same manner as in the
foregoing case of the operation command "Scale-up".

Meanwhile, the pointing area read program 14 loaded
in the main storage 2 is started by the information
processor 1 at the same time that the acoustic
recognition program 13 is started by the above method.
Thus, the processing of the program 14 proceeds
concurrently. At this time, variables P0 and Q which are
set in the buffer area W of the main storage 2 are reset
to zeros (steps S201 and S202 in Fig. 21), respectively.
While the user is in contact with the touch panel 5
with, e. g., the finger tip or the pen (S203), the
pointing area read program 14 accepts touched
coordinates through the panel controller 3 at fixed time
intervals (S204). The program 14 increments the variable
P0 each time it accepts the coordinates, and it writes
the accepted x-coordinate into an array X[P0] in the
pointing area table 204 (Fig. 8) of the main storage 2,
the accepted y-coordinate into an array Y[P0], and the
input time of the coordinates into an array T[P0]
(S205). When a certain predetermined time period T0 has
lapsed since the release of the finger tip, the pen or
the like from the touch panel 5, the program 14
terminates the writing operation (S203). After the
termination of the writing operation, in order to decide
which of a display object, an input position or a size
or target point the entered coordinates designate, the
program 14 checks if the difference value
(T[i+1] - T[i]) of the input times of the adjoining
individual coordinate Nos., for example, coordinate
No. i and coordinate No. (i + 1) stored in the pointing
area table 204 is equal to or greater than a certain
predetermined value Tg (S206). When this condition is
encountered anywhere, the program 14 increments the
variable Q and writes the coordinate No. i into an array
Z in the main storage 2 (in the form of Z[Q] = i)
(S207). Such steps are iterated until the value i
becomes greater than the value P0 (S208).
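The gesture-splitting steps S206 to S208 can be sketched in Python as follows. The function name is hypothetical, and closing the final gesture at the last coordinate No. P0 is an assumption of the sketch, since the description does not spell out how the last gesture is terminated.

```python
def split_gestures(times, gap_threshold):
    """Return the array Z of 1-based coordinate Nos. at which a gesture ends.

    times holds the input times T[1..P0] as a 0-based list, so the time of
    coordinate No. i is times[i - 1]. gap_threshold plays the role of Tg.
    """
    z = []
    for i in range(1, len(times)):  # i is the 1-based coordinate No.
        if times[i] - times[i - 1] >= gap_threshold:  # T[i+1] - T[i] >= Tg
            z.append(i)  # Z[Q] = i
    if times:
        z.append(len(times))  # assumption: P0 closes the final gesture
    return z
```

With the result Z, the k-th gesture covers the coordinate Nos. Z[k-1]+1 to Z[k], which matches the ranges X[1] - X[Z(1)] and X[Z(1)+1] - X[Z(2)] used by the programs 1501 and 1503.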
When the user's pointing and vocalization have ended,
the information integration program 15 loaded in the
main storage 2 is started. An array S in the main
storage 2 is reset to zero (step S301 in Fig. 22), and
the syntax check program 1500 is started (S302).
Subsequently, in the same manner as in the case of the
operation command "Scale-up", the matching between the
feature vector obtained and the acoustic standard
pattern data 16 is performed, with the result that the
input voices are transformed into a character string
(S303). Further, the character string is subjected to a
morphemic analysis by the conventional method (S304),
with the result that morphemic information such as
"DRAW" (verb), "BUILDING" (noun, object name) and "THIS
SIZE" (adverb phrase, size) is obtained. At the next
step, the command extraction program 1502 is started
(S305). The verb column 191 in the command dictionary 19
shown in Fig. 19 is referred to, and the verb "DRAW" of
the morphemic information is collated therewith. As a
result, the verb "DRAW" 1913 (in the verb column 191) is
selected, and command No. "3" (indicated at numeral 1943
in the command No. column 193) is stored in the array S
held in the main storage 2, in the form of S[0] = 3
(S306). Thus, a new input mode is established. Besides,
the illustration parts dictionary 17 as shown in Fig. 13
is referred to. In this regard, any word (contained in
the column of part names in the dictionary 17) other
than standard words (contained in the column of standard
part name) is changed into the existent standard word
corresponding thereto. In the example mentioned here,
the word "BUILDING" is the standard word and need not be
changed. In due course, the part No. 5 ("BUILDING") of
the bit map data 18 corresponding to the part No. of the
illustration parts dictionary 17 is selected as the new
input object (step S315 in Fig. 22). In this case, it is also
possible to install, for example, an illustration parts
dictionary with which a word is retrieved on the basis
of close words or expressions by a conceptual meaning
network, as stated in Fujisawa: "Media Space for
Systemization of Conceptual Knowledge" (Computer
Science, 2, 1, 1992). Subsequently, the scale-up/down
ratio calculation program 1503 is started (S316). This
program 1503 calculates the ratio α of the difference
value (XMx0 - XMn0) between the minimum value XMn0 and
maximum value XMx0 within the X coordinate values stored
in the elements X[1] - X[Z(1)] of the array X in the
main storage 2, relative to the standard maximum length
XMD5 of the designated subject, and the ratio β of the
difference value (YMx0 - YMn0) between the minimum value
YMn0 and maximum value YMx0 within the Y coordinate
values stored in the elements Y[1] - Y[Z(1)] of the
array Y in the main storage 2, relative to the standard
maximum width YMD5 of the designated subject. In
addition, the target point calculation program 1504, the
processing of which is exemplified in Fig. 26, is
started (step S317 in Fig. 22). This program 1504
calculates coordinates (Xnt, Ynt) based on the mean
value Xnt of the X values XMn0 and XMx0 and the mean
value Ynt of the Y values YMn0 and YMx0 (step S701 in
Fig. 26), and it sets the coordinates (Xnt, Ynt) as the
central point of the illustration part (S702). These
steps S701 and S702 correspond to a step S320 in
Fig. 22. Subsequently, the cells of the bit map are
interpolated or thinned out in conformity with the
calculated ratios α = 2.2 and β = 2.0, thereby scaling
the picture up or down (to enlarge or reduce the size of
the picture) (S312). On this occasion, both the
scale-up/down ratios may well be set at either the
lengthwise scale-up/down ratio α or the widthwise
scale-up/down ratio β. It is also possible that the
scale-up/down ratios are fixed to the value α or β,
thereby rendering the picture similar to the standard
form of the part illustration. As a result, the drawing
table 4000 is written as shown in Fig. 14 (S313), and an
editing picture frame as shown in Fig. 15 is displayed
(S314).

In case of moving an illustration part, as shown in
Fig. 16 by way of example, the user gives commands "MOVE
THIS HERE" through voices and gestures. The voices are
received through the microphone 8 (step S101 in
Fig. 20). When the user designates a subject
to-be-handled and a target point position by drawing
curves D and D* on the touch panel 5 with a finger tip,
a pen or the like, the acoustic recognition program 13
is started to convert the input voices into a digital
signal (S102), and a feature vector is extracted (S103),
in the same manner as in the foregoing case of the
operation command "Scale-up". Besides, the pointing area
read program 14 is started, and variables P0 and Q are
reset to zeros (steps S201 and S202 in Fig. 21),
respectively. While the user is in contact with the
touch panel 5 with, e. g., the finger tip or the pen
(S203), the pointing area read program 14 accepts
touched coordinates through the panel controller 3 at
fixed time intervals (S204). The program 14 writes the
accepted x-coordinate and y-coordinate data and the
input times thereof into the pointing area table 204
(Fig. 8) of the main storage 2, at the fixed time
intervals from the coordinate No. "1" in the order in
which they are entered (S205). When a certain
predetermined time period T0 has lapsed since the
release of the finger tip, the pen or the like from the
touch panel 5, the program 14 terminates the writing
operation (S203). After the termination of the writing
operation, in the same manner as in the foregoing case
of the operation command "Scale-up", the program 14
checks if the difference value (T[i+1] - T[i]) of the
input times of the adjoining individual coordinate Nos.,
for example, coordinate No. i and coordinate No. (i + 1)
stored in the pointing area table 204 is equal to or
greater than a certain predetermined value Tg (S206).
When this condition is encountered anywhere, the program
14 increments the variable Q and writes the coordinate
No. i into an array Z in the main storage 2 (in the form
of Z[Q] = i) (S207). Such steps are iterated until the
value i becomes greater than the value P0 (S208).
When the user's pointing and vocalization have ended,
the information integration program 15 loaded in the
main storage 2 is started. An array S is reset to zero
(step S301 in Fig. 22), and the syntax check program
1500 is started (S302). Subsequently, in the same manner
as in the case of the operation command "Scale-up", the
matching between the feature vector obtained and the
acoustic standard pattern data 16 is performed, with the
result that the input voice commands are transformed
into a character string (S303). Further, the character
string is subjected to a morphemic analysis by the
conventional method (S304), with the result that
morphemic information such as "MOVE" (verb), "THIS"
(pronoun, object name) and "HERE" (adverb, position) is
obtained. At the next step, the command extraction
program 1502 is started (S305). The command dictionary
19 shown in Fig. 19 is referred to, and command No. "2"
(indicated at numeral 1942 in the command No. column
193) corresponding to the verb "MOVE" 1912 (in the verb
column 191) is stored in the array S held in the main
storage 2, in the form of S[0] = 2 (S306 in Fig. 22, and
S401, S402 in Fig. 23). Thus, a movement mode is
established. Next, word attribute Nos. "21" and "22"
corresponding respectively to the extracted word
attributes "Object name" and "Position" are stored in
the array S in the order in which they are entered, in
the forms of S[1] = 21 and S[2] = 22 (S307 in Fig. 22,
and S403, S404 in Fig. 23). Subsequently, the object
extraction program 1501 is started in the order in which
the word attribute Nos. are entered (S308), and the "Car
B" is decided as the designated subject to-be-handled
through the same processing as in the case of the
operation command "Scale-up" (S309). In addition, the
target point calculation program 1504 is started (S318).
This program 1504 sets coordinates (Xnt, Ynt) based on
the mean value Xnt of the X values XMn1 and XMx1 and the
mean value Ynt of the Y values YMn1 and YMx1, as the
central point of the part illustration (S319 in Fig. 22,
and S701, S702 in Fig. 26). As a result, the drawing
table 4000 shown in Fig. 4 is rewritten as shown in
Fig. 17 (S313), and an editing picture frame as shown in
Fig. 18 is displayed (S314).
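The target point calculation (steps S701 and S702) and the resulting rewrite of the drawing table can be sketched in Python as follows. The dictionary used for the drawing table, its keys and the function names are hypothetical; the actual layout of the drawing table 4000 is the one shown in the figures.

```python
def target_point(gesture_points):
    """Return (Xnt, Ynt): the means of the min/max X and min/max Y values."""
    xs = [x for x, _ in gesture_points]
    ys = [y for _, y in gesture_points]
    return (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2


def move_part(drawing_table, part_name, gesture_points):
    """Return a copy of the table with the part's central point set to the target."""
    updated = {name: dict(row) for name, row in drawing_table.items()}
    updated[part_name]["center"] = target_point(gesture_points)
    return updated
```

Only the central coordinates of the designated part change; its scale-up/down ratios are left as they were, so the part is re-displayed at the new position with its current size.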

Although, in the embodiments described 
above, the whole illustration part is treated as one 
unit, the present invention is not restricted to this 
aspect of performance. By way of example, in the 
case of designating the window of a car, the sub- 
ject "window of a car" can also be recognized in 
such a way that the user vocalizes "window" while 
pointing to the vicinity of the "car" as a pointing 
gesture. To this end, the data of the constituent 
elements of the part illustrations may be stored in 
the illustration parts dictionary 17 and the bit map 
data 18. Besides, in a case where a plurality of 
objects of the same sort (e. g., two cars) are 
displayed, one of the cars being a subject to-be- 
handled can be recognized in the same way as in 
the aforementioned case that the user vocalizes 
"car" while pointing to the pertinent "car" as a 
pointing gesture. 

According to the present invention, a user des- 
ignates a subject to-be-entered or edited by a 
voice command input and thus makes an edit re- 
quest, while at the same time, he/she designates 
the subject, the display position thereof, the size 
thereof, etc. by pointing gestures, whereby the 
subject having the designated display position and 
the designated size can be displayed on a display 
screen quickly and easily. Moreover, owing to an 
illustration parts dictionary which stores therein 
necessary parts for drawing an illustration, and to a 
function by which a part is displayed at a des- 
ignated position and with a designated size when 
the name of the part is given by a voice input and 
the display position and the size are simultaneous- 
ly designated by pointing gestures, the user is 
freed from having to draw the part to-be-displayed 
through, e. g., the combination of basic patterns by 
himself/herself, and he/she is permitted to display 
the part on the display screen with ease. 

Claims 

1. A method of accepting multimedia operation
commands wherein, while pointing to either of
a display object and a display position on a
display screen of a graphics display system
through a pointing input device, a user commands
the graphics display system to cause an event on
a graphics display, through a voice input device;
comprising:

a first step of allowing the user to perform
a pointing gesture so as to enter a string of
coordinate points which surround one area for
either of the display object or any desired
display position;

a second step of allowing said user to give
a voice command together with said pointing
gesture;

a third step of recognizing a command
content of said voice command by a speech
recognition process in response to said voice
command;

a fourth step of recognizing a command
content of said pointing gesture in accordance
with a recognized result of said third step; and

a fifth step of executing the event on the
graphics display in accordance with the command
contents of said voice command and said pointing
gesture.

2. A method of accepting multimedia operation
commands as defined in Claim 1, wherein in a
case where it has been recognized on the
basis of the voice command content at said
third step that the entered string of coordinate
points designate said display object as a
designated subject to-be-handled, and where a
plurality of display objects which share common
areas with said area formed by said string
of coordinate points are existent on the display
screen, one of said display objects which has
the largest common area with said area of said
string of coordinate points is determined as
said designated subject at said fourth step.

3. A method of accepting multimedia operation
commands as defined in Claim 2, wherein in a
case where a plurality of display objects none
of which share common areas with said area
formed by said string of coordinate points are
existent, one of said display objects which has
a central point nearest a coordinate point
which is indicated by a mean value of maximum
and minimum X values of said string of
coordinate points and that of maximum and
minimum Y values thereof is determined as
said designated subject at said fourth step.

4. A method of accepting multimedia operation
commands as defined in Claim 1, wherein in a
case where it has been recognized on the
basis of the voice command content at said
third step that the entered string of coordinate
points designate said display position, a
coordinate point which is indicated by a mean
value of maximum and minimum X values of
said string of coordinate points and that of
maximum and minimum Y values thereof is
determined as the designated display position
at said fourth step.

5. A method of accepting multimedia operation
commands as defined in Claim 1, wherein in a
case where it has been recognized on the
basis of the voice command content at said
third step that the entered string of coordinate
points designate a size of the specified display
object on the display screen after scale-up/down
of said specified display object, a ratio α
between a difference value of minimum
and maximum values within X coordinate values
of said entered string of coordinate points
and a standard maximum length of said display
object and a ratio β between a difference
value of minimum and maximum values within
Y coordinate values and a standard maximum
width of said display object are calculated at
said fourth step, and sizes of said display
object in X and Y directions thereof are
respectively determined on the basis of the
calculated ratios α and β at said fifth step.

6. A method of accepting multimedia operation
commands as defined in Claim 1, wherein in a
case where it has been recognized on the
basis of the voice command content at said
third step that the entered string of coordinate
points designate a size of a display object
which is to be newly entered, a ratio α between
a difference value of minimum and
maximum values within X coordinate values of
said entered string of coordinate points and a
standard maximum length of said display object
and a ratio β between a difference value of
minimum and maximum values within Y coordinate
values and a standard maximum width
of said display object are calculated at said
fourth step, and sizes of said display object in
X and Y directions thereof are respectively
determined on the basis of the calculated ratios
α and β at said fifth step.

7. A method of accepting multimedia operation
commands as defined in Claim 5, wherein only
one of said ratios α and β is calculated at said
fourth step, and it is shared for determining
said sizes of said display object in both said X
and Y directions thereof at said fifth step.

8. A method of accepting multimedia operation
commands as defined in Claim 6, wherein only
one of said ratios α and β is calculated at said
fourth step, and it is shared for determining
said sizes of said display object in both said X
and Y directions thereof at said fifth step.

9. A method of accepting multimedia operation
commands as defined in Claim 1, wherein a
plurality of strings of coordinate points are
respectively entered through pointing gestures
at said first step, a plurality of voice commands
are respectively given together with
said pointing gestures at said second step,
command contents of said voice commands
are respectively recognized at said third step,
a time interval between input times of the
respectively adjoining individual coordinate
points which constitute the entered string of
coordinate points is checked at said fourth
step, thereby sensing a termination of one
pointing gesture, and the sensed individual
pointing gestures in a sequence of the sensing
are respectively brought into correspondence
with the individual voice command contents in
a sequence of the recognition at said fifth step.

10. A method of accepting multimedia operation 
commands as defined in Claim 1, wherein a 
part illustrations dictionary in which drawing 
data of parts required to be illustrated and 
names of the respective parts are stored is 
prepared before said first step, said user des- 
ignates the input position on the display screen 
at said first step, and said user designates the 
name of any desired part in said part illustra- 
tions dictionary through said voice command 
at said second step, whereby said command 
contents of said voice command and said 
pointing gesture are respectively recognized at 
said third and fourth steps, and the designated 
part is displayed at the display position and 
with a size on said display screen as des- 
ignated by said pointing gesture, at said fifth 
step. 

11. A display system which is commanded by a
user to cause an event concerning a display
object on a graphics display, by the use of a
voice command and a pointing gesture;
comprising:

pointing input means for entering a string
of coordinate points which surround one area
for either of the display object on the graphics
display and a display position of said display
object;

a pointing area table which stores therein
said string of coordinate points entered by said
pointing input means;

bit map data memory means for storing
therein bit map data of various display parts
that constitute said display object, and standard
maximum widths and standard maximum
lengths of said display parts;

a drawing table which stores therein identifiers
of said display parts selected from within
said bit map data memory means and displayed
on said graphics display, widthwise and
lengthwise scale-up/down ratios of said display
parts relative to the standard maximum widths
and lengths on said graphics display, and
positional information of said display parts;

a display parts dictionary which holds
therein speech-recognizable names of the
individual display parts stored in said bit map
data memory means;

voice command input means for entering
the voice command of the user;

speech recognition means for recognizing
said voice command entered by said voice
command input means, with reference to said
display parts dictionary;

display parts extraction means for extracting
said display parts on said graphics display
as designated on the basis of said string of
coordinate points in said pointing area table;

target point calculation means for calculating
a target point designated on the basis of
said string of coordinate points in said pointing
area table;

scale-up/down ratio calculation means for
calculating the widthwise and lengthwise
scale-up/down ratio information of said display parts
on the basis of said string of coordinate points
in said pointing area table; and

control means for selectively activating at
least one of said display parts extraction
means, said target point calculation means and
said scale-up/down ratio calculation means in
accordance with a result of the speech
recognition, and for rewriting said drawing table
on the basis of a result of the activating.

12. A display system as defined in Claim 11,
wherein said voice command consists of a
command content which expresses a sort of
the event concerning said display object, and a
command attribute which includes at least one
of an object name, a size and said display
position of said display object which are
collateral with said command content; and said
control means selects and activates any of said
display parts extraction means, said target
point calculation means and said scale-up/down
ratio calculation means in accordance
with said command attribute as the result of
the recognition of said voice command.

13. A display system as defined in Claim 11,
wherein in a case where a plurality of display
objects which share common areas with said
area formed by said string of coordinate points
exist on a display screen of said display system,
said display parts extraction means determines
one of said display objects which has
the largest common area with said area of said
string of coordinate points, as the designated
subject to-be-handled.

14. A display system as defined in Claim 13, 
wherein in a case where a plurality of display 
objects none of which share common areas 
with said area formed by said string of coordi- 
nate points exist, said display parts extraction 
means determines one of said display objects 
which has a central point nearest a coordinate 
point which is indicated by a mean value of 
maximum and minimum X values of said string 
of coordinate points and that of maximum and 
minimum Y values thereof, as said designated 
subject. 

15. A display system as defined in Claim 11, 
wherein said target point calculation means 
determines a coordinate point which is indi- 
cated by a mean value of maximum and mini- 
mum X values of said string of coordinate 
points and that of maximum and minimum Y 
values thereof, as the designated display posi- 
tion. 

16. A display system as defined in Claim 11,
wherein said scale-up/down ratio calculation
means calculates a ratio α between a difference
value of minimum and maximum values
within X coordinate values of the entered string
of coordinate points and a standard maximum
length of said display object and a ratio β
between a difference value of minimum and
maximum values within Y coordinate values
and a standard maximum width of said display
object, and it determines sizes of said display
object in X and Y directions thereof on the
basis of the calculated ratios α and β,
respectively.